Types of Semantic Information Necessary in a Machine Translation Lexicon

Author

  • David Mowatt
Abstract

This paper describes research undertaken into assessing what types of semantic information (SI) are needed in a Machine Translation (MT) lexicon in order for ‘good’ translation quality to be attainable. We present a typology of semantic information, allowing the use of semantics in any MT system to be quantified in precise and absolute, rather than relative, terms. This typology was used to survey the SI present in twenty commercial and research MT systems. An automatically translated corpus was analysed to identify which types of semantics were necessary to achieve high quality translation. The survey and the analysis allowed us to conclude that four of the nine types of SI identified should always be included and that a further two complex SI types should be considered for inclusion pending further analysis. A formal lexicon specification incorporating these six SI types is presented.

Similar resources

A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...

Boosting Lexical Resources for the Semantic Web: Generative Lexicon and Lexicon Interoperability

Computational lexicons can play a key role in the Semantic Web: aiming at making word content machine-understandable, they intend to provide an explicit representation of word meaning, so that it can be directly accessed and used by computational agents, such as large-coverage parsers, modules for intelligent Information Retrieval or Information Extraction. In all these cases, semantic informat...

Bilingual FrameNet Dictionaries for Machine Translation

This paper describes issues surrounding the planning and design of GermanFrameNet (GFN), a counterpart to the English-based FrameNet project. The goals of GFN are (a) to create lexical entries for German nouns, verbs, and adjectives that correspond to existing FrameNet entries, and (b) to link the parallel lexicon fragments by means of common semantic frames and numerical indexing mechanisms. G...

A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation

In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-of-the-art hierarchical and conventional phrase-based statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply certain restrictions to discard some of the less useful information. We show how these restrictio...

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatically identifying the words that bear semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct roles to them may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

Publication date: 1999